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Abstract —  Model-based  recognition  is  concerned  with  comparing  a 
shape  .4,  which  is  stored  as  a  model  for  some  particular  object,  with 
a  shape  B,  which  is  found  to  exist  in  an  image.  If  A  and  B  are  close  to 
being  the  same  shape,  then  a  vision  system  should  report  a  match  and 
return  a  measure  of  how  good  that  match  is.  To  be  useful  this  measure 
should  satisfy  a  Dumber  of  properties,  including:  1)  it  should  be  a  metric, 

2)  it  should  be  invariant  under  translation,  rotation,  and  change-of-scale, 

3)  it  should  be  reasonably  easy  to  compute,  and  4)  it  should  match  our 
intuition  (i.e..  answers  should  be  similar  to  those  that  a  person  might  give). 
We  develop  a  method  for  comparing  polygons  that  has  these  properties. 
The  method  is  based  on  the  Lj  distance  between  the  turning  functions 
of  the  two  polygons.  It  works  for  both  convex  and  nonconvex  polygons 
aod  runs  in  time  0(mn  log  mn )  where  m  is  the  number  of  vertices  in  one 
polygon  and  n  is  the  number  of  vertices  in  the  other.  We  also  present  some 
examples  to  show  that  the  method  produces  answers  that  are  intuitively 
reasonable. 

Index  Terms — Computational  geometry,  distance  metric,  model-based 
matching,  shape  comparison,  similarity  transformation,  turning  angle 
(theta-s)  representation. 


I.  Introduction 

A  PROBLEM  of  both  theoretical  and  practical  importance  in 
computer  vision  is  that  of  comparing  two  shapes.  To  what 
extent  is  shape  A  similar  to  shape  fl?  Model-based  recognition 
is  concerned  with  comparing  a  shape  A,  which  is  stored  as  a 
model  for  some  particular  object,  with  a  shape  B,  which  is  found 
to  exist  in  an  image.  If  A  and  B  are  close  to  being  of  the  same 
shape,  then  a  vision  system  should  report  a  match  and  return  a 
measure  of  how  good  that  match  is.  Hence,  we  are  interested  in 
defining  and  computing  a  cost  function  d(A,  fl)  associated  with 
two  shapes  A  and  B  that  measures  their  dissimilarity. 

The  long-term  goal  of  this  research  is  to  develop  methods  of 
comparing  arbitrary  shapes  in  two  or  three  dimensions.  Here, 
we  restrict  our  attention  to  polygonal  shapes  in  the  plane,  with 
an  extension  to  the  case  in  which  a  boundary  may  contain 
circular  arcs  in  addition  to  straight  line  segments.  Our  technique 
is  designed  to  work  with  objects  for  which  the  entire  boundaries 
are  known. 

Before  suggesting  a  measure  to  be  used  for  comparing  poly¬ 
gons,  we  examine  several  properties  that  such  a  measure  d(-,  ■) 
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should  have  (for  related  arguments  see  [7]). 

•  It  should  be  a  metric. 
d(,4.  fl)  >  0  for  all  A  and  B. 

d(A.  fl)  =  0  if  and  only  if  A  =  fl.  We  expect  a  shape  to 
resemble  itself. 

d(A.B)  =  d(B.A)  for  all  A  and  B  (Symmetry).  The  order 
of  comparison  should  not  matter. 
d(  A.  fl)  +  d(B.  C)  >  d{A.  C)  for  all  A,  B,  and  C  (Triangle 
Inequality). 

The  triangle  inequality  is  necessary  since  without  it  we  can 
have  a  case  in  which  d(A.B)  and  d(B.C)  are  both  very 
small,  but  d(A.C)  is  very  large.  This  is  undesirable  for 
pattern  matching  and  visual  recognition  applications.  If  A  is 
very  similar  to  B  and  B  is  very  similar  to  C,  then  A  and  C 
should  not  be  too  dissimilar. 

*  It  should  be  invariant  under  translation,  rotation,  and  change- 
of-scale.  In  other  words,  we  want  to  measure  shape  alone. 

*  It  should  be  reasonably  easy  to  compute.  This  must  hold  for 
the  measure  to  be  of  practical  use. 

•  Most  important  of  all,  it  should  match  our  intuitive  notions 
of  shape  resemblance.  In  other  words,  answers  should  be 
similar  to  those  that  a  human  might  give.  In  particular, 
the  measure  should  be  insensitive  to  small  perturbations  (or 
small  errors)  in  the  data.  For  example,  moving  a  vertex  by 
a  small  amount  or  breaking  a  single  edge  into  two  edges 
should  not  have  a  large  effect. 

A.  Representation  of  Polygons 

A  standard  method  of  representing  a  simple  polygon  A  is 
to  describe  its  boundary  by  giving  a  (circular)  list  of  vertices, 
expressing  each  vertex  as  a  coordinate  pair.  An  alternative 
representation  of  the  boundary  of  a  simple  polygon  A  is  to  give 
the  turning  function  6  4(a).  The  function  64(a)  measures  the 
angle  of  the  counterclockwise  tangent  as  a  function  of  the  arc- 
length  a,  measured  from  some  reference  point  O  on  A’s  boundary. 
Thus  64(0)  is  the  angle  v  that  the  tangent  at  the  reference 
point  O  makes  with  some  reference  orientation  associated  with 
the  polygon  (such  as  the  x-axis).  64(a)  keeps  track  of  the 
turning  that  takes  place,  increasing  with  left-hand  turns  and 
decreasing  with  right-hand  turns  (see  Fig.  1).  Formally,  if  *(a)  is 
the  curvature  function  for  a  curve,  then  *(a)  =  6'(a)  [13].  The 
curvature  function  n(s)  is  frequently  used  as  a  shape  signature 
[5],  [6],  [11],  [14]. 

Other  authors  have  used  a  slightly  different  definition  of 
the  turning  function  in  which  64(0)  is  defined  to  be  0.  Our 
definition,  in  which  64(0)  is  the  angle  of  the  tangent  line  at 
the  reference  point,  leads  to  a  simple  correspondence  between 
a  shift  of  64(a)  in  the  6  direction  and  a  rotation  of  A.  This 
correspondence  is  less  clear  for  the  alternate  definition. 

Without  loss  of  generality,  we  assume  that  each  polygon  is 
rescaled  so  that  the  total  perimeter  length  is  1;  hence,  64  is  a 
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Fig.  1.  Defining  the  turn  function  0(« ). 


function  from  [0. 1J  to  3f.  For  a  convex  polygon  A,  64(3)  is 
a  monotone  function,  starting  at  some  value  v  and  increasing 
to  u  +  2tt.  For  a  nonconvex  polygon,  64(a)  may  become 
arbitrarily  large,  since  it  accumulates  the  total  amount  of  turn, 
which  can  grow  as  a  polygon  “spirals”  inward.  Although  64(a) 
may  become  very  large  over  the  interval  a  €  [0, 1],  in  order  for 
the  function  to  represent  a  simple  closed  curve,  we  must  have 
64(1)  =  64(0)  +  2 it  (assuming  that  the  origin  O  is  placed  at 
a  differentiable  point  along  the  curve). 

The  domain  of  SA(s)  can  be  extended  to  the  entire  real  line 
in  a  natural  way  by  allowing  angles  to  continue  to  accumulate 
as  we  continue  around  the  perimeter  of  the  polygon  A.  Thus,  for 
a  simple  closed  polygon,  the  value  of  64(3  +  1)  is  64(a)  +  2ir 
for  all  s.  Note  that  the  function  64(a)  is  well-defined  even  for 
arbitrary  (not  necessarily  simple  or  closed  or  polygonal)  paths  A 
in  the  plane.  When  the  path  is  polygonal,  the  turning  function 
is  piecewise-constant,  with  jump  points  corresponding  to  the 
vertices  of  A. 

Representation  of  planar  curves  (and,  in  particular,  polygons) 
in  terms  of  some  function  of  arc  length  has  been  used  by  a 
number  of  other  researchers  in  computational  geometry  (e.g.,  [8], 
[11])  and  computer  vision  (cf.  (2])  We  use  this  representation  to 
compute  a  distance  function  for  comparing  two  simple  polygons 
(A  and  B)  by  looking  at  natural  notions  of  distances  between  the 
turning  functions  64(a)  and  8s(a). 

The  function  64(a)  has  several  properties  which  make  it 
especially  suitable  for  our  purposes.  It  is  piecewise-constant  for 
polygons  (and  polygonal  paths),  making  computations  particu¬ 
larly  easy  and  fast.  By  definition,  the  function  64(a)  is  invariant 
under  translation  and  scaling  of  the  polygon  A.  Rotation  of  A 
corresponds  to  a  simple  shift  of  64(a)  in  the  9  direction.  Note 
also  that  changing  the  location  of  the  origin  O  by  an  amount 
t  €  [0, 1]  along  the  perimeter  of  polygon  A  corresponds  to  a 
horizontal  shift  of  the  function  64(a)  and  is  simple  to  compute 
[the  new  turning  function  is  given  by  64(3  + 1)]. 


We  formally  define  the  distance  function  between  two  poly¬ 
gons  A  and  B  as  the  Lp  distance  between  their  two  turning 
functions  6A(s)  and  6a(s),  minimized  with  respect  to  vertical 
and  horizontal  shifts  of  the  turning  functions  (in  other  words,  we 
minimize  with  respect  to  rotation  and  choice  of  reference  points). 
For  p  »  2  we  show  that  this  distance  function  can  be  computed 
efficiently  in  0(n2  log  n)  time  for  polygons  with  n  vertices. 

One  possible  drawback  of  our  distance  function  is  that  it  may 
be  unstable  under  certain  kinds  of  noise,  in  particular,  nonuniform 
noise.  For  example  suppose  we  have  a  triangle  with  one  very 
wiggly  side,  such  as  the  one  shown  in  Fig.  2.  Comparing  it 
to  a  triangle  we  will  get  a  very  bad  (in  fact  arbitrarily  bad) 
match,  because  the  proportion  of  the  perimeter  corresponding 
to  the  sides  labeled  A  and  B  can  approach  zero.  Fortunately,  in 
many  computer  vision  applications  it  is  reasonable  to  assume  that 
the  noise  is  roughly  uniformly  distributed  over  the  sides  of  the 
polygon,  in  which  case  the  similarity  measure  we  define  performs 
nicely.  See  Section  IV  for  examples. 

Schwartz  and  Sharir  [11]  have  defined  a  notion  of  distance 
similar  to  ours.  However,  they  compute  an  approximation  based 
on  discretizing  the  turning  functions  of  the  two  shapes  into  many 
equally  spaced  points;  thus,  the  quality  of  the  approximation 
depends  on  the  number  of  points  chosen.  Our  approach,  on 
the  other  hand,  is  to  examine  the  combinatorial  complexity 
of  computing  the  exact  metric  function  between  two  polygon 
boundaries,  using  only  the  original  vertices.  Thus,  our  method 
runs  in  time  0{n2  logn)  where  n  is  the  total  number  of  polygon 
vertices,  while  their  method  computes  an  approximate  distance 
in  time  0(k  log  k),  where  k  »  n  is  the  total  number  of  inter¬ 
polation  points  used.  Furthermore,  [11]  suggest  “convexifying" 
nonconvex  polygons  in  order  to  compare  them,  as  their  method 
does  not  apply  to  nonconvex  polygons.  Our  algorithm  applies  to 
both  convex  and  nonconvex  polygons  (and  even  to  nonsimple 
polygons). 

The  remainder  of  this  paper  is  organized  as  follows.  In 
Section  II  we  give  a  formal  definition  of  the  distance  between 
two  polygons  based  on  their  turning  functions.  We  prove  that 
this  function  is  a  metric  and  show  some  of  its  properties.  These 
results  are  used  in  Section  III  to  develop  an  0(n3)  algorithm  for 
computing  the  distance  between  two  polygons,  where  n  is  the 
total  number  of  vertices;  we  then  refine  this  algorithm  to  obtain 
an  0(nJ  log  n)  running  time.  Section  IV  contains  examples  of 
the  distance  function  computed  for  several  polygons  using  an 
implementation  of  our  method.  Section  V  is  a  summary  and 
discussion  of  extensions  and  further  research. 

II.  A  Polygon  Distance  Function 

Consider  two  polygons  A  and  B  and  their  associated  turning 
functions  64(a)  and  6*(a).  The  degree  to  which  A  and  B  are 
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similar  can  be  measured  by  taking  the  distance  between  the 
functions  6U«)  and  ©a(s)  according  to  our  favorite  metric  on 
function  spaces  (e.g.,  Lp  metrics).  Define  the  Lp  distance  between 
.4  and  B  as 

<y.4.  D)  =  i|0  ,  -  ©slip  =  |©-J*)  -  ©s(s)|  pds 

where  ||  •  ||p  denotes  the  Lp  norm  [9], 

(■  has  some  undesirable  properties:  it  is  sensitive  to  both 
rotation  of  polygon  A  (or  B)  and  choice  of  reference  point  on 
the  boundary  of  A  (or  B).  Since  rotation  and  choice  of  reference 
point  are  arbitrary,  it  makes  more  sense  to  consider  the  distance 
to  be  the  minimum  over  all  such  choices.  If  we  shift  the  reference 
point  O  along  A  s  boundary  by  an  amount  r,  then  the  new  turning 
function  is  given  by  &A(s  +  t).  If  we  rotate  A  by  angle  9  then 
the  new  function  is  given  by  0.t(s)  +  9.  Thus,  we  want  to  find 
the  minimum  over  all  such  shifts  t  and  rotations  9  (i.e.,  over  all 
horizontal  and  vertical  shifts  of  04(s)).  In  other  words,  we  wan 
to  solve  for 

dJA.B)=(  min  [  |0A(s  +  f)  -  ©s(s)  +  9\pds 
'  \«e  »  f  e  ;o  r  Jn 

=  (  min  d;-bM)V. 


where 

* 

Dp  B(t.9)  =  [  |0*(a  +  t)  —  0s(s)  +  9yds. 

Jo 

Lemma  1:  dp(A.  B )  is  a  metric  for  all  p  >  0. 

Proof:  Clearly  d„( •)  is  everywhere  positive,  is  symmetric, 
and  has  the  identity  property,  because  ||  •  ||p  is  a  metric  and 
has  these  properties.  We  now  show  that  dp(-,-)  also  obeys  the 
triangle  inequality,  dp(A.B)  +  dp(B.C)  >  dp(A.C),  by  a 
straightforward  application  of  the  Minkowski  inequality  for  Lp 
metrics. 

Let  tai,  and  0ab  be  the  minimizes  of  dp(A.  B )  (i.e.,  dp(A,  B)  = 
[Dp  BUnb.9„h)\ l/P)-  Similarly  let  tbc  and  9hr  be  the  minimizes 
of  dp(B.  C ).  We  define  t '  =  („<,  +  tbc  and  9'  =  9ah  +  9bc.  Now, 
dp(A.B)  +  dp(B.C)  = 


=  f  l©4 
J  0 


( s  -f  t^b)  —  0s(a)  +  9ab\pds 


+  /  |©b($  +  tb c)  —  0c(«)  +  9bc\pds 


v: 

ii: 


i  i. 


0-t(s  +  fab  +  tbc)  —  ©b(s  +  he)  +  9ab\pds 

+  [  |0b(s  +  ff>c)  -  ©c(«)  +  9bc\pds 

.Jo 


which  by  the  Minkowski  inequality  (cf.  [9])  is 


f\eA(s 

.Jo 


a(s  +  t„i>  +  tbc)  -  ©s(»  +  he)  +  9a b 
+  ©b(s  +  tbc)  -  ©c(s)  +  0bc|',<isj 


=  /  I© 


=  D*c(t'.9'). 


A(s  +  t')-ec(s)  +  9'\pds 


Fig.  3.  The  rectangular  strips  formed  by  the  functions  0  A (.« j  and  Ogi  sl 
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Clearly  Dp  c(t'.9')  >  dp(A.C)  because  the  latter  is  minimized 
over  all  values  of  r  and  9  including  t'  and  9'.  □ 

Lemma  2:  For  any  fixed  value  of  r.  and  for  any  pa  1, 
(f.  9)  is  a  convex  function  of  9. 

Proof:  For  fixed  s  and  fixed  r,  the  function  F(9)  = 
|©  A (s  +  t)  -  ©b(s)  +  9\p  is  clearly  a  convex  function  by  the 
convexity  of  G(y)  =  |j/|p  (for  p  *  1),  and  integrating  a  convex 
function  maintains  convexity  (i.e.,  if  a  positive  valued  function 
'y(i.y)  is  convex  in  y  for  fixed  x,  then  f  ~i{x.y)dx  is  also  a 
convex  function  of  y).  □ 

In  particular,  using  the  L2  metric,  DA  B{t.9)  is  a  quadratic 
function  of  9  for  any  fixed  value  of  r.  This  holds  for  any  closed 
shapes,  but  it  is  especially  easy  to  see  for  polygons. 

We  assume  from  now  on  that  A  and  B  are  polygons.  Then, 
for  a  fixed  r,  the  integral  fQl  |©4(s  -ft)  -  ©a(s)  +  9\2ds  can 
be  computed  by  adding  up  the  value  of  the  integral  within  each 
strip  defined  by  a  consecutive  pair  of  discontinuities  in  ©A(s) 
and  ©s(s)  (see  Fig.  3).  The  integral  within  a  strip  is  trivially 
computed  as  the  width  of  the  strip  times  the  square  of  the 
difference  |©a(s  +  f)  -  ©b(*)I  (which  is  constant  within  each 
strip).  Note  that  if  m  and  n  are  the  numbers  of  the  vertices  in 
A  and  B,  respectively,  then  there  are  m  +  n  strips  and  that  as  9 
changes,  the  value  of  the  integral  for  each  strip  is  a  quadratic 
function  of  9. 

In  order  to  compute  dj(A,  B),  we  must  minimize  Dj'B(t.9) 
over  all  t  and  9.  We  begin  by  finding  the  optimal  9  for  any  fixed 
value  of  t.  To  simplify  notation  in  the  following  discussion,  we 
use  f(s)  =  ©A(s).  g(s)  =  ©b(«).  and  h(t.9)  =  DA  B(t.9). 

Lemma  3:  Let  h(t.9)  =  fgl  (f(s  At)  -  g(s)  +  9)2ds.  Then,  in 
order  to  minimize  h(t,9),  the  best  value  of  9  is  given  by 


9‘(t) 


-[ 


A  SJa 


{g{s)~  }{s  +  t))ds 


=  a  -  2 irt. 


where  a  =  Jgg{s)ds  -  Jg  f(s)ds. 
Proof: 


"T  -  a*I 

DTIC  T>.H 
UriEL-Luc  vr'et-d 
Juatt/iCucio 


dh(t,9) 

89 


■r 


(: 29  +  2f(s  +  t)-2g(s))(U 


■i: 


=  29  +  2  /  ( f(s  +  t)  -  g(s))ds 


By - _ 

Distribution/ 


Lemma  2  assures  us  that  the  minimum  occurs  when  we  set  this 
quantity  equal  to  zero  and  solve  for  9.  Thus, 

9'(t)  =  f  (g(s)  -  f(s  +  t))ds. 

J  0 
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Now,  /0l  f(s  +  t)  ds 


r **» 

=  I  f(s)ds 

=  J  f(s)  ds  +  J'  [f(s  -  1)  +  27r]  da 


f(s)ds+  I  f(s)ds  +2vt 


=  j'  f(s)ds  +  J 

=  2tt t  +  /  /(«)  ds 
Jo 


Thus, 


g(s)da  -  2-irt  - 


=  a  -  2 jr£. 


□ 

Substituting  the  expression  for  0*(t)  in  d2(A,B)  we  are  left 
with  a  one-variable  minimization  problem. 


d2{A.  B)  = 


|  min  [jf  ([/(a  + 1)  -  ^(s)]2  -  [0* (t)]2  +  20* (t) 

•[/(s  +  f)-«7(s)])<fe|  j 

{min  [|  [/(a  +  f)  -  </(*)]’<*»  -  [O'WJ1]  }*.  (1) 
0 


Proo/:  We  give  a  geometric  proof.  First  recall  that  for  a 
given  value  of  t  the  discontinuities  in  /  and  g  define  a  set  of 
m  *  n  rectangular  strips  (see  Fig.  3).  The  value  of  h(t.O)  is 
simply  the  sum  over  all  these  strips  of  the  width  of  a  strip  times 
the  square  of  its  height.  Except  at  critical  events,  as  /  is  shifted 
the  width  of  each  strip  changes,  but  the  height  remains  constant. 
Each  changing  rectangle  contributes  to  changes  in  h(t.  9).  If  /  is 
the  amount  of  shift,  then  for  a  shrinking  rectangle,  the  change  is 
(-  r)  times  the  square  of  the  height;  for  a  growing  rectangle  the 
change  is  (+  t)  times  the  square  of  the  height.  Since  the  heights 
are  constant,  the  change  in  h(t.  9)  is  a  sum  of  linear  terms  and 
is  therefore  linear.  Breakpoints  in  h(t.  9)  clearly  occur  at  each  of 
the  mn  critical  events  where  a  discontinuity  of  /  is  aligned  with 
a  discontinuity  of  g.  □ 

This  result  leads  to  a  straightforward  algorithm  for  computing 
di(A.  B).  Let  ( t’.9 *)  be  the  location  of  the  minimum  value  of 
h\t,0).  By  the  preceding  lemma,  h{t.O’)  is  piecewise-linear  as 
a  function  of  r  with  breakpoints  among  a  fixed  set  of  critical 
values;  thus,  r*  must  be  at  one  of  the  critical  values.  Now, 
h(t,0’(t))  =  h(t.O)  -  [0*(t)]2  =  h(t.O)  -  [a-2irt}2  [from 
(1)],  so  it  suffices  to  evaluate  h(t.O)  =  /  [f(s  +  t)  - 
at  critical  values  of  r. 

Corollary  S:  The  distance  d2(A.  B)  between  two  polygons  A 
and  B  (with  m  and  n  vertices)  can  be  computed  exactly  in  time 
0(mn(m  +  n)). 

Proof:  For  given  values  of  t  and  9,  h(t.  9)  can  be  computed 
in  0(m  +  n)  time  by  adding  the  contributions  of  the  m  +  n 
rectangular  strips  between  /  and  g.  Let  c<,,  ct.  ■  ■  ■  ,cm„  be  the 
critical  events  that  occur  as  /  is  shifted  by  r.  By  the  preceding 
observations,  the  minimum  of  h(t,9)  occurs  when  t  equals 
one  of  Co,Ci,  --,cmn.  Since  the  best  9  value  for  a  given  r 
[namely,  0*(t)]  can  be  found  in  constant  time  (Lemma  3),  we 
simply  compute  h(t,  9'(t))  in  0(m  +  n)  time  for  each  of  these 
critical  events,  find  the  minimum,  and  take  its  square  root  to  get 
d*(A,fl).  □ 


III.  Algorithmic  Details 

In  this  section  we  show  that  the  distance  function  achieves  its 
minimum  at  one  of  mn  discrete  points  on  [0, 1],  which  we  call 
critical  events.  Recall  that  in  the  process  of  finding  dj(A,  B)  we 
have  to  shift  the  function  f(s)  to  f(s  +  r)  for  t  €  [0, 1],  During  this 
shifting  operation  the  breakpoints  of  / collide  with  the  breakpoints 
of  g.  We  define  a  critical  event  as  a  value  of  t  where  a  breakpoint 
of  /  collides  with  a  breakpoint  of  g.  Gearly  there  are  mn  such 
critical  events  for  m  breakpoints  in  /  and  n  breakpoints  in  g. 

Using  the  fact  that  the  minimum  is  obtained  at  a  critical  event, 
we  present  a  basic  algorithm  for  computing  dj(A,  B)  that  runs  in 
0(n3)  time  for  two  /i-vertex  polygons  (and  time  0(mn(m  +  n)) 
for  an  m  vertex  polygon  and  an  n  vertex  polygon).  We  then  de¬ 
scribe  how  to  modify  the  basic  method  to  improve  the  runtime  to 
0(n2  log  n)  (or  0(mn  log  mn)  for  unequal  numbers  of  vertices). 

Recall  that  d2(A,B)  =  mmt  t(h(t,0))K  where  h{t,9)  = 
D2  B(t,8).  We  prove  that  h(t,9)  has  properties  that  lead  to 
efficient  algorithms  for  computing  our  polygon  metric. 

Lemma  4:  If  /(-)  and  g()  are  two  piecewise -constant  functions 
with  m  and  n  breakpoints,  respectively,  then  for  constant  9, 

h(U)=  f  (/(a  + 1)  —  g(»)  +  Bfds 
Jo 

is  piecewise-linear  as  a  function  of  r,  with  mn  breakpoints  which 
are  independent  of  the  value  9. 


A.  Refinement  of  the  Algorithm 

The  above  time  bound  can  be  improved  by  using  a  somewhat 
more  complex  algorithm. 

Theorem  6:  The  distance  &i(A.  B )  between  two  polygons  A 
and  B  (with  m  and  n  vertices)  can  be  computed  exactly  in  time 
0(mn  log  mn). 

Proof:  We  prove  the  theorem  by  describing  the  algorithm. 
The  basic  idea  is  the  same  as  the  previous  algorithm:  we 
compute  h(t,9'(t))  for  each  of  the  critical  values  of  t.  By  the 
comments  before  Corollary  S,  it  suffices  to  evaluate  h(t.O)  = 
/  [/(*  + 1)  -  g(s)]2ds  at  critical  values  of  r.  Now  we  observe 
that  h(t,0)  varies  with  t  in  a  very  constrained  fashion.  As  a 
matter  of  fact,  by  keeping  track  of  a  small  set  of  values  we 
can  easily  determine  how  the  function  h(t,0)  changes  at  each 
critical  event. 

The  values  we  keep  track  of  are  based  on  the  rectangular  strips 
that  appear  between  the  two  functions  f(s)  and  g(s).  Recall  that 
g(s)  is  fixed  in  place  and  that  f(s)  is  shifted  backwards  by  t. 
For  a  given  value  of  r,  the  discontinuities  in  f(s  *  t)  and  g(s) 
define  a  set  of  rectangular  strips,  as  was  illustrated  in  Fig.  3. 
Each  rectangular  strip  has  /  at  the  top  and  g  at  the  bottom  or 
vice  versa.  The  side  of  a  strip  are  determined  by  discontinuities 
in  /  and  g. 

For  the  purposes  of  the  algorithm,  we  separate  the  strips  into 
four  groups  based  on  the  discontinuities  at  the  sides  of  the  strips: 
Rg  for  those  with  /  on  both  sides;  R„  for  those  with  g  on  both 
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sides;  Rf,  for  those  with  /  on  the  left  and  g  on  the  right;  and 
n  for  those  with  g  on  the  left  and  /  on  the  right.  The  sets  Rf, 
and  arc  particularly  important,  as  these  are  the  strips  whose 
widths  change  as  t  changes  (as  /  is  shifted).  Thus,  these  strips 
affect  the  slope  of  h(t.O). 

We  keep  track  of  two  quantities;  Hf,  and  H#.  Hf%  is  the  sum 
of  the  squares  of  all  the  heights  of  all  the  strips  in  Rf ,,  and  H# 
is  the  sum  of  the  squares  of  the  heights  of  all  the  strips  in  Ra. 

The  algorithm  is  based  on  the  observation  that  for  values  of  t 
between  two  critical  events  the  slope  of  h(t,  0)  is  Hf,  -  H#.  This 
follows  from  the  fact  that,  as  /  is  shifted  backwards  by  r,  Rf,  is 
the  set  of  all  strips  that  increase  in  width  by  f,  and  R #  is  the  set 
of  all  strips  that  decrease  in  width  by  r.  The  widths  of  the  Rg 
and  Rgg  strips  remain  unchanged. 

Consider  what  happens  at  one  of  the  critical  events,  where 
the  change  is  no  longer  simply  linear.  We  claim  that  quantities 
Hfg  and  H#  can  be  easily  updated  at  these  points.  To  see  this 
note  that,  at  a  critical  event,  a  gf-type  strip  disappears  (its  width 
goes  to  zero)  and  a  new  fg- type  strip  appears  (see  Fig.  3).  At  the 
same  time,  the  right  boundary  of  the  adjacent  strip  to  the  left  is 
converted  from  g  to  /,  and  the  left  boundary  of  the  adjacent  strip 
to  the  right  is  converted  from  /  to  g.  To  update  Hf,  and  H#  we 
need  to  know  just  the  values  of  /  and  g  around  the  critical  event. 

This  gives  us  the  following  algorithm. 

1)  Initialize: 

•  Given  the  piecewise-constant  functions  /  and  g,  deter¬ 
mine  the  critical  events:  the  shifts  of  /  by  r  such  that 
a  discontinuity  in  /  coincides  with  a  discontinuity  in  g. 
Sort  these  critical  events  by  how  far  /  must  be  shifted 
for  each  event  to  occur.  Let  Co.  ct.  •  •  • .  c«  be  the  ordered 
list  of  shifts  for  the  critical  events  c0  *  0. 

•  Calculate  h( 0, 0).  This  involves  summing  the  contribu¬ 
tions  of  each  ot  m  +  n  strips  and  takes  linear  time. 

•  Determine  initial  values  for  Hf,  and  H#. 

2)  For  i  =  1  to  e 

•  Determine  the  value  of 

h(c,.0)  =  (Hf3-  Hgf)(c,  -  c,_i)  +  A(c,_ i,0). 

•  Update  Hf,  and  H#. 

The  algorithm  takes  advantage  of  the  fact  that  h(t,0)  is 
piecewise-linear  as  a  function  of  r,  thus,  the  entire  function 
can  be  determined  once  we  know  an  initial  value  and  the 
slope  for  each  piece.  It  is  easy  to  see  that  the  time  for 
initialization  is  dominated  by  the  time  it  takes  to  sort  the 
critical  events:  0(e  loge),  where  e  is  the  number  of  critical 
events,  or  0(mn  log  mn)  where  m  and  n  are  the  sizes  of 
the  two  polygons.  The  updates  required  for  the  remainder 
of  the  algorithm  take  a  total  of  0(e),  or  0{mn)  time. 

□ 

In  practice,  it  might  be  useful  to  recalculate  h(t,  0)  periodically 
from  scratch  to  avoid  errors  that  could  accumulate.  If  this  is  done 
every  0{  )  steps  then  the  time  bound  for  the  entire  algorithm 

remains  0{e loge). 

IV.  Examples 

In  this  section  we  illustrate  some  of  the  qualitative  aspects 
of  the  distance  function  d,(A,B )  by  comparing  some  simple 
polygons  using  the  algorithm  described  in  the  previous  section.  In 
addition  to  providing  a  distance  <fj(A,  B),  between  two  polygonal 


Fig.  4.  Comparing  two  simple  polygons. 


shapes,  the  method  gives  the  relative  orientation  0*  and  the 
corresponding  reference  points  of  the  two  polygons  for  which 
this  distance  is  attained. 

The  first  example  compares  two  simple  polygons  that  are 
very  similar  in  shape,  but  which  are  at  different  orientations 
(see  Fig.  4).  The  value  of  d2(.4,  B)  is  0.144  which  is  attained 
at  a  rotation  of  180  degrees  and  with  the  upper  left  vertex  of 
the  first  polygon  matched  with  the  lower  right  vertex  of  the 
second  one.  (Distances  less  than  about  0.5  seem  to  correspond 
to  polygons  that  a  person  would  rate  as  resembling  each  other; 
pairs  of  polygons  that  are  very  different  can  have  arbitrarily  high 
distances.) 

To  illustrate  how  the  distance  function  can  be  used  to  compare 
a  model  with  several  different  instances,  we  consider  the  eight 
shapes  illustrated  in  Fig.  5(a)  and  5(b).  In  Fig.  5(a)  the  shapes 
are  ordered  by  their  distance  from  the  square;  in  5(b)  the  same 
shapes  are  ordered  by  their  distance  from  the  triangle.  The  order 
of  the  shapes  corresponds  remarkably  well  to  our  intuitive  idea 
of  shape-resemblance.  The  match  to  the  cutoff  triangles  suggests 
that  the  metric  is  useful  for  matching  partially  occluded  objects, 
as  long  as  the  overall  shape  of  the  object  does  not  change  too 
radically. 

Our  metric  also  provides  a  qualitatively  good  estimate  of  a 
match  when  one  polygon  is  an  instance  of  another,  but  with 
some  perturbation  of  its  boundary.  A  simple  example  is  given 
by  the  cutoff  triangle  in  Fig.  5.  Another  example  is  given  in 
Fig.  6,  where  we  compare  a  model  rectangle  against  another 
rectangle  with  a  notch  removed.  The  distance  is  0.327  with  a 
relative  orientation  of  179  degrees. 

An  extreme  case  of  matching  distorted  polygons  is  shown  in 
Fig.  7,  where  a  triangle  is  compared  with  a  somewhat  triangular 
shape.  In  this  case  the  distance  is  0.834,  and  the  orientation 
difference  is  16  degrees.  Note  however,  that,  as  mentioned  in  the 
introduction  (see  Fig.  2),  such  perturbations  must  occur  relatively 
uniformly  along  the  perimeter  of  the  polygon  for  the  match  to 
be  reasonable.  (A  smoothing  technique  is  likely  to  alleviate  the 
problem  of  nonuniform  perturbations.) 

V.  Summary  and  Discussion 

We  have  suggested  using  the  Lj  metric  on  the  turning  functions 
of  polygons  as  a  way  to  implement  the  intuitive  notion  of 
shape-resemblance.  This  method  for  comparing  shapes  has  the 
following  advantages: 

*  It  is  a  metric  on  polygonal  shapes. 

*  It  compares  shape  alone;  it  is  invariant  under  translation, 
rotation,  and  change -of-scale. 

*  It  is  reasonably  easy  to  compute,  taking  time  0(ron  log  mn) 
to  compare  an  m  vertex  polygon  against  an  n  vertex  polygon. 

*  Finally,  it  corresponds  well  to  intuitive  notions  of  shape 
resemblance. 
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Fig.  5.  Comparing  several  polygons. 


Fig.  7.  Matching  a  triangle  to  a  highly  noisy  shape. 


In  addition,  this  metric  works  for  nonconvex  as  well  as  convex 
polygons,  and  even  works  for  polygonal  shapes  that  are  not 
simple. 

A  number  of  other  authors  have  considered  the  problem  of 
determining  the  extent  to  which  one  shape  resembles  another 
(e.g.,  [1],  [3],  [S],  [8],  [10],  [11]).  In  contrast  to  our  distance 


function,  these  methods  either  are  not  metrics,  do  not  compare 
shapes  independent  of  position,  orientation  and  scale,  or  are  not 
efficient  to  compute.  The  most  similar  method  to  ours  is  that 
of  Schwartz  and  Sharir  [11],  [12].  Huttenlocher  and  Kedem  [4] 
have  recently  developed  a  metric  for  comparing  polygonal  shapes 
independently  of  an  affine  transformation,  thus  extending  our 
results  to  a  more  general  class  of  shape  transformations.  Their 
method  computes  a  distance  based  on  the  Hausdorff  metric,  and 
also  runs  in  time  0(mn  log  mn). 

Like  the  method  of  [11],  [12],  our  method  is  actually  based  on 
a  convolution.  Recall  that  the  major  portion  of  our  algorithm  is 
devoted  to  minimizing  h(t.9)  =  fg'  ( f(s  +  t)  -  g(s)  +  9yds. 
When  this  formula  is  multiplied  out,  all  the  terms  depend 
on  /  alone  or  g  alone,  except  for  the  convolution  term 
f0  f(s  +  t)g(s)  ds.  If  / and  g  are  piecewise-constant  with  m  and 
n  discontinuities,  respectively,  then  each  term  can  be  calculated 
in  either  0(m )  or  0(n)  time  except  for  the  convolution  term, 
which  seems  to  require  0{mn  log  mn)  time.  Of  course,  the  fast 
Fourier  transform  (FFT)  can  be  used  to  compute  a  convolution  in 
0(k  log  k)  time,  but  this  requires  k  evenly  spaced  sample  points 
for  each  of  /and  g.  For  our  problem,  the  discontinuities  are  not 
necessarily  evenly  spaced,  so  the  FFT  cannot  be  used  unless 
we  are  willing  to  approximate  our  functions  /  and  g.  A  good 
approximation  may  require  more  than  mn  points.  (Schwartz  and 
Sharir  [11]  avoid  these  discontinuities  by  rotating  the  turning 
functions.  This  makes  it  possible  to  use  the  FFT,  although  it 
restricts  their  method  to  convex  polygons.)  In  any  case,  the 
development  of  a  fast  method  for  convolutions  using  unevenly 
spaced  sample  points  would  lead  to  improvements  in  the  time 
bound  for  our  technique. 

We  used  the  L2  metric,  but  similar  techniques  can  be  used  to 
develop  polygon-resemblance  metrics  that  are  based  on  different 
function-space  metrics.  Unfortunately,  not  all  such  metrics  have 
L{s  advantages  of  being  reasonably  easy  to  compute  and  match¬ 
ing  our  intuitive  idea  of  shape  resemblance.  For  instance,  it  is 
also  possible  to  compute  the  L,  metric  on  two  8(s)  functions 
using  an  algorithm  similar  to  that  in  Section  HI.  In  the  case  of 
the  L\  metric,  however,  the  value  of  6*  is  not  given  directly 
for  each  value  of  r  as  it  is  for  the  Lz  metric.  Thus  for  each  of 
the  mn  critical  events,  the  optimal  value  of  9  must  be  computed 
explicitly.  Using  a  data  structure  similar  to  that  in  Section  III,  the 
overall  computation  can  be  done  in  time  0(nJ  log  n),  as  opposed 
to  0(n*  logn)  for  the  Li  metric. 

The  L\  metric  has  an  additional  drawback:  the  optimal  match 
will  occur  when  one  side  of  polygon  A  is  at  the  same  orientation 
as  some  side  of  polygon  B.  This  is  because  D*  B{t,0)  is  a 
piecewise  linear  in  both  r  and  9,  so  the  minimum  occurs  at  a 
critical  event  in  9  as  well  as  at  a  critical  event  in  r.  In  contrast, 
the  Li  metric  finds  the  optimal  orientation  (in  a  least  squares 
sense)  without  requiring  any  two  edges  to  be  identically  oriented. 
Examine  Fig.  8  to  see  why  requiring  identical  orientations  can 
be  undesirable;  for  the  L\  metric  the  best  match  occurs  at 
an  orientation  difference  of  76  degrees,  bringing  two  edges 
into  alignment.  This  would  rotate  the  two  figures  so  that  they 
approximately  form  a  star,  a  bad  match.  In  contrast,  for  the 
metric  the  best  match  is  at  an  orientation  difference  of  7  degrees, 
which  agrees  quite  well  with  our  intuitive  sense  of  the  best  match. 

It  may  be  possible  to  apply  our  methods  to  problems  involving 
partially  occluded  objects,  that  is  objects  for  which  the  entire 
model  is  known,  but  for  which  only  a  portion  of  the  boundary 
appears  in  the  image.  Our  technique  as  presented  here  has  not 
been  designed  to  work  with  such  objects,  although,  as  shown  by 
some  of  our  examples,  it  seems  to  give  intuitively  correct  answers 
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Fig.  8.  A  match  for  which  the  L\  metric  does  poorly  but  the  L;  does  well. 

when  objects  are  not  severely  occluded.  The  combination  of 
occluded  objects  and  our  desire  to  make  our  metric  independent 
of  change-of-scale  causes  some  difficulty.  We  were  able  to 
control  change-of-scale  problems  by  normalizing  our  polygons 
to  make  all  perimeters  have  ltngth  one.  If  portions  of  a  boundary 
are  unknown  then  it  is  unclear  how  this  normalization  should  be 
done.  If  the  scale  of  the  image  is  known,  then  partially  occluded 
objects  should  not  present  any  severe  difficulties. 

Our  results  can  be  generalized  to  include  cases  in  which  the 
shapes  A  and  B  have  some  or  all  of  their  boundary  represented 
as  circular  arcs.  If  A  includes  some  circular  arcs  on  its  boundary, 
then  the  turning  function  0a(s)  is  piecewise-linear  instead  of 
piecewise-constant.  As  before,  to  compare  shapes  A  and  B  we 
need  to  minimize 

h(t.9)=  f  \eA(s  +  t)-eB(s)  +  o\2ds. 

Jo 

The  derivation  of  9'(t)  does  not  change  at  all,  because  h(t.d)  is 
still  a  quadratic  function  of  9.  But  if  the  shapes  include  circular 
arcs,  then  h  can  be  piecewise-cubic  as  a  function  of  t  for  fixed 
9  (instead  of  simply  piecewise-linear).  Thus,  the  minimum  value 
of  h(t.  9)  does  not  necessarily  occur  at  a  critical  event,  and  more 
information  is  needed  in  order  to  determine  the  behavior  of  h 
between  critical  events. 

However,  since  h(t.9’(t))  is  piecewise-cubic  as  a  function 
of  r,  we  can  determine  the  behavior  of  h  between  critical 
events  if  we  have  enough  data  points  for  h.  If  we  collect 
two  additional  (t.h{t.9’(t)))  pairs  between  each  adjacent  pair 
of  critical  events  then  we  can  determine  the  coefficients  of 
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